There is a mathematical theorem about self-modifying agents which states that an agent will not self-modify to invalidate its utility function, because if it did, the modified agent would not maximize the current agent’s utility function.
That’s not correct. There is no such “mathematical theorem”.
Indeed we know that some agents will wirehead, since we can see things like heroin addicts, hyperinflation and Enron in the real world.
Humans don’t have utility functions, though.
Edit: Oops. Apparently they do.
See: Any computable agent may be described using a utility function.
Sorry, I notice you’ve had this argument at least once before. That’ll learn me to shoot my mouth off. In my defense, the wiki just says “[utility functions] do not work very well in practice for individual humans” without any mention of this fact.
However, I’m still not certain that you can take heroin addicts as proof that some agents self-modify to invalidate their utility functions.
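For what it's worth, here is one trivial way to read the claim that any computable agent may be described using a utility function. This is only a sketch under my own assumptions: the names `utility_from_policy`, `maximizer`, `erratic_policy` and the action list are made up for illustration, and the linked argument may well use a different construction. The idea is to assign utility 1 to whatever the agent actually does and 0 to everything else, so that "maximizing utility" reproduces the agent's behaviour exactly.

```python
from typing import Callable, Sequence

# Hypothetical action set, chosen only for illustration.
ACTIONS = ["work", "rest", "wirehead"]


def erratic_policy(observation: int) -> str:
    """An arbitrary computable agent: maps an observation to an action."""
    return ACTIONS[(observation * 7 + 3) % len(ACTIONS)]


def utility_from_policy(
    policy: Callable[[int], str],
) -> Callable[[int, str], float]:
    """Build a utility function that scores 1.0 for the action the
    policy would take and 0.0 for any other action (assumed construction)."""
    def utility(observation: int, action: str) -> float:
        return 1.0 if action == policy(observation) else 0.0
    return utility


def maximizer(
    utility: Callable[[int, str], float], actions: Sequence[str]
) -> Callable[[int], str]:
    """An agent that picks the action with the highest utility."""
    def act(observation: int) -> str:
        return max(actions, key=lambda a: utility(observation, a))
    return act


u = utility_from_policy(erratic_policy)
agent = maximizer(u, ACTIONS)

# The utility maximizer behaves exactly like the original agent.
matches = all(agent(o) == erratic_policy(o) for o in range(20))
```

Note that a utility function built this way is just a restatement of the behaviour, so it says nothing about whether the agent's preferences are stable. That may be part of why heroin addicts are hard to use as evidence either way.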